Complex Corpus Annotation: The Prague Dependency Treebank

Author

  • Jan Hajič

Abstract

The Prague Dependency Treebank (Hajič et al., 2001) is approaching the publication of its second version, in which tectogrammatical annotation is added to the existing morphological and analytical (surface-syntactic) annotation. This article describes the Prague Dependency Treebank as a whole, including a brief history of the project. Three further papers in this volume give detailed accounts of some of the most recently tackled phenomena at the tectogrammatical level of annotation (Panevová and Lopatková, 2004; Cinková and Kolářová, 2004; Urešová, 2004).

Similar resources

From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank

The present paper reports on preparatory research for building a language corpus annotation scenario capturing discourse relations in Czech. We primarily focus on the description of syntactically motivated relations in discourse, basing our findings on the theoretical background of the Prague Dependency Treebank 2.0 and the Penn Discourse Treebank 2. Our aim is to revisit the present-...

Annotation Procedure in Building the Prague Czech-English Dependency Treebank

In this paper, we present some organizational aspects of building a large corpus with rich linguistic annotation, with the Prague Czech-English Dependency Treebank (PCEDT) serving as an example. We stress the necessity of dividing the annotation process into several well-planned phases. We present a system for automatically checking the correctness of the annotation and describe several ways to measu...

Coreference in Prague Czech-English Dependency Treebank

We present coreference annotation on parallel Czech-English texts of the Prague Czech-English Dependency Treebank (PCEDT). The paper describes innovations made to PCEDT 2.0 concerning coreference, as well as the coreference information already present there. We characterize the coreference annotation scheme, give the statistics and compare our annotation with the coreference annotation in Onton...

Prague Dependency Treebank Annotation Errors: A Preliminary Analysis

This paper presents a basic analysis of syntactic annotation errors and inconsistencies in the Prague Dependency Treebank, the biggest corpus of Czech with manual syntactic annotation. The corpus is used for developing and testing many syntactic analysers of Czech, and problems in the annotation have an essential impact on the evaluation of the quality of these parsers and the results of ...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Publication date: 2005